SNP Data Quality Control in a National Beef and Dairy Cattle System and Highly Accurate SNP Based Parentage Verification and Identification
Abstract
A major use of genetic data is parentage verification and identification, because inaccurate pedigrees negatively affect genetic gain. Since 2012, the international standard for single nucleotide polymorphism (SNP) parentage verification in Bos taurus cattle has been the ISAG SNP panels. While these ISAG panels provide greater parentage accuracy than microsatellite (MS) markers, they can validate the wrong parent at a misconcordance rate of ≤1%, indicating that more SNP are needed if a more accurate pedigree is required. Rapidly increasing numbers of cattle are being genotyped in Ireland, representing 61 B. taurus breeds from a wide range of farm types: beef/dairy, AI/pedigree/commercial, purebred/crossbred, and large to small herd sizes. Using these data, the Irish Cattle Breeding Federation (ICBF) analyzed different SNP densities and determined that a minimum of 500 SNP is needed to consistently predict only one set of parents at a ≤1% misconcordance rate. For parentage validation and prediction, ICBF uses 800 SNP (ICBF800), selected based on SNP clustering quality, ISAG200 inclusion, call rate (CR), and minor allele frequency (MAF) in the Irish cattle population. Large datasets require both sample and SNP quality control (QC). Most publications deal only with SNP QC via CR, MAF, parent-progeny conflicts, and Hardy-Weinberg deviation, and not with sample QC. We report here parentage, SNP QC, and genomic sample QC pipelines that address the unique challenges of >1 million genotypes from a national herd, such as SNP genotype errors caused by mis-tagging of animals, lab errors, farm errors, and multiple other issues. We divide the pipeline into two parts: a Genotype QC pipeline and an Animal QC pipeline. The Genotype QC pipeline identifies samples with low call rate, missing or mixed genotype classes (no BB genotype, or A, B, T, and G alleles present together), and low genotype frequencies.
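The parentage check and the sample-level Genotype QC checks described above can be sketched in Python. This is a minimal illustration under stated assumptions, not ICBF's implementation: genotypes are assumed to be Illumina-style "AA"/"AB"/"BB" strings with "--" for a no-call, misconcordance is assumed to be the share of jointly called SNP at which parent and offspring are opposing homozygotes (AA vs BB), and the 0.90 call-rate cutoff and all function names are hypothetical. The 0.01 default mirrors the ≤1% misconcordance rate mentioned above.

```python
from typing import Dict, List, Optional, Sequence

MISSING = "--"  # assumed no-call code (hypothetical encoding)


def call_rate(genotypes: Sequence[str]) -> float:
    """Fraction of SNP with a genotype call."""
    called = sum(1 for g in genotypes if g != MISSING)
    return called / len(genotypes) if genotypes else 0.0


def genotype_qc_flags(genotypes: Sequence[str],
                      min_call_rate: float = 0.90) -> List[str]:
    """Sample-level Genotype QC flags; the 0.90 cutoff is an illustrative assumption."""
    flags = []
    if call_rate(genotypes) < min_call_rate:
        flags.append("low_call_rate")
    called = [g for g in genotypes if g != MISSING]
    # A sample with no BB calls at all suggests a missing genotype class.
    if called and "BB" not in called:
        flags.append("missing_BB_class")
    # Any allele other than A/B means A/B and nucleotide codes are mixed.
    alleles = {allele for g in called for allele in g}
    if not alleles <= {"A", "B"}:
        flags.append("mixed_allele_codes")
    return flags


def misconcordance(parent: Sequence[str], child: Sequence[str]) -> Optional[float]:
    """Share of jointly called SNP where parent and child are opposing homozygotes."""
    compared = conflicts = 0
    for p, c in zip(parent, child):
        if p == MISSING or c == MISSING:
            continue
        compared += 1
        if {p, c} == {"AA", "BB"}:  # AA vs BB cannot be inherited
            conflicts += 1
    return conflicts / compared if compared else None


def candidate_parents(child: Sequence[str], pool: Dict[str, Sequence[str]],
                      max_rate: float = 0.01) -> List[str]:
    """IDs of candidates whose misconcordance with the child is <= max_rate."""
    hits = []
    for animal_id, geno in pool.items():
        rate = misconcordance(geno, child)
        if rate is not None and rate <= max_rate:
            hits.append(animal_id)
    return hits


if __name__ == "__main__":
    child = ["AA", "AB", "BB", "AB", "--", "AA"]
    sire = ["AB", "AB", "BB", "AA", "AB", "AA"]
    other = ["BB", "AB", "AA", "AB", "AA", "BB"]
    print(misconcordance(sire, child))   # 0.0
    print(misconcordance(other, child))  # 0.6
    print(candidate_parents(child, {"sire1": sire, "bull2": other}))  # ['sire1']
    print(genotype_qc_flags(child))      # ['low_call_rate']
```

On real data, `pool` would hold every genotyped candidate sire or dam. The concern raised above is that with too few SNP, more than one unrelated candidate can fall under the 1% threshold, so `candidate_parents` would return several animals instead of one.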
The Animal QC pipeline handles situations where the genotype might not belong to the listed individual by identifying more than one non-matching genotype per animal, SNP duplicates, sex and breed prediction mismatches, parentage and progeny validation results, and other situations. The Animal QC pipeline makes use of the ICBF800 SNP set where appropriate to identify errors in a computationally efficient yet highly accurate way.
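Two of the Animal QC checks, SNP duplicates and more than one non-matching genotype per animal, reduce to comparing pairs of genotypes. The sketch below is a hypothetical illustration, not ICBF's code: it assumes the same "AA"/"AB"/"BB"/"--" encoding, measures concordance as the share of jointly called SNP with identical genotypes, and uses an assumed 0.99 concordance cutoff; the function names are invented for illustration. It compares all pairs, which is quadratic in the number of samples, so it shows the logic only and would not scale to >1 million genotypes as written; running the comparison on a reduced set such as ICBF800 is what keeps it cheap per pair.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

MISSING = "--"  # assumed no-call code (hypothetical encoding)


def concordance(g1: Sequence[str], g2: Sequence[str]) -> Optional[float]:
    """Share of jointly called SNP at which two genotype vectors are identical."""
    compared = matches = 0
    for a, b in zip(g1, g2):
        if a == MISSING or b == MISSING:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else None


def find_duplicates(samples: Dict[str, Sequence[str]],
                    min_concordance: float = 0.99) -> List[Tuple[str, str]]:
    """Pairs of sample IDs whose genotypes are near-identical (possible SNP duplicates)."""
    pairs = []
    for (id1, g1), (id2, g2) in combinations(samples.items(), 2):
        c = concordance(g1, g2)
        if c is not None and c >= min_concordance:
            pairs.append((id1, id2))
    return pairs


def animals_with_conflicting_records(records: Dict[str, List[Sequence[str]]],
                                     min_concordance: float = 0.99) -> List[str]:
    """Animals with >1 genotype record where at least two records disagree."""
    flagged = []
    for animal_id, genotypes in records.items():
        for g1, g2 in combinations(genotypes, 2):
            c = concordance(g1, g2)
            if c is not None and c < min_concordance:
                flagged.append(animal_id)
                break  # one conflicting pair is enough to flag the animal
    return flagged


if __name__ == "__main__":
    a = ["AA", "AB", "BB", "AA"]
    b = ["BB", "AB", "AA", "AB"]
    print(find_duplicates({"s1": a, "s2": list(a), "s3": b}))  # [('s1', 's2')]
    records = {"cow1": [a, list(a)], "cow2": [a, b]}
    print(animals_with_conflicting_records(records))  # ['cow2']
```

Here `samples` is keyed by a sample ID rather than an animal ID, so two different tags carrying the same genotype surface as a near-identical pair, while `records` groups every genotype submitted under one animal ID.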
Similar sources
Imputation of Microsatellite Alleles from Dense SNP Genotypes for Parental Verification
Microsatellite (MS) markers have recently been used for parental verification and are still the international standard despite higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphisms (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exist to verify parentage for an individual unle...
Utilization of a 17 Microsatellites Set For Bovine Traceability in Czech Cattle Populations
For identification of individuals and parentage control performed by cattle breeders in the Czech Republic, a novel Finnish Bovine Genotypes™ Panel 3.1 was amplified by means of one multiplex polymerase chain reaction. The Bovine Panel encompasses all 12 STR loci recommended by the International Society for Animal Genetics (ISAG) for routine use in parentage testing and identification, including...
Detection of Single-Nucleotide Polymorphism in the Bovine AGPAT6 Gene Associated with Milk Fat Content using Tetra-Primer ARMS PCR-based Assay in Karan Fries Breeding Bulls
Background: The bovine AGPAT6 gene is one of the potential candidate genes governing milk fat synthesis. Objectives: Identification of single nucleotide polymorphisms (SNP) in the targeted region of AGPAT6 gene and their effect on expected breeding values (EBV) of first lactation milk production traits viz. fat %, fat yield and 305 days milk yield in Karan Fries (KF) breeding bulls were so...
Livestock Farming Systems and Cattle Production Orientation in Eastern High Plains of Algeria, Cattle Farming System in Algerian Semi-Arid Region
This study was an attempt to devise productive orientations of cattle herds in the eastern high plains of Algeria. In this regard, 165 randomly identified farms were investigated. The selection of breeders was based on the existence of cattle on the farm, and each farmer proposed for investigation had to have at least two cows. The approach taken was to identify all systems adopted by farmers in a region t...
Using Genomic Data to Improve Dairy Cattle Genetic Evaluations
Genomic data: Genotypes for about 40,000 single nucleotide polymorphisms (SNP) now act as a third source of data for national genetic evaluations of dairy cattle, in addition to phenotypes and pedigrees that were the basis of selection for the previous 100 years. Rapid developments in genotyping tools have lowered the cost of obtaining this genomic data to just over $200 per animal. The Illumina...